 metropolitan area


These California metro areas are among the most AI-ready in the nation

Los Angeles Times

Despite suggestions it has been losing its edge, California is way ahead of others when it comes to the hottest technology right now: artificial intelligence. The regions around San Francisco, San José and Los Angeles are among the best prepped for AI in the country, according to a report released Wednesday by the Brookings Institution. The Washington think tank dubbed the San Francisco and San José metropolitan areas "superstars" when it comes to AI readiness. Three out of the top 10 city regions most ready for AI are in California, according to the report. No other state has more than one region in the top 10.


HouseTS: A Large-Scale, Multimodal Spatiotemporal U.S. Housing Dataset

Wang, Shengkun, Sun, Yanshen, Chen, Fanglan, Wang, Linhan, Ramakrishnan, Naren, Lu, Chang-Tien, Chen, Yinlin

arXiv.org Artificial Intelligence

Accurate house-price forecasting is essential for investors, planners, and researchers. However, reproducible benchmarks with sufficient spatiotemporal depth and contextual richness for long-horizon prediction remain scarce. To address this, we introduce HouseTS, a large-scale, multimodal dataset covering monthly house prices from March 2012 to December 2023 across 6,000 ZIP codes in 30 major U.S. metropolitan areas. The dataset includes over 890K records, enriched with points of interest (POI), socioeconomic indicators, and detailed real estate metrics. To establish standardized performance baselines, we evaluate 14 models, spanning classical statistical approaches, deep neural networks (DNNs), and pretrained time-series foundation models. We further demonstrate the value of HouseTS in a multimodal case study, where a vision-language model extracts structured textual descriptions of geographic change from time-stamped satellite imagery. This enables interpretable, grounded insights into urban evolution. HouseTS is hosted on Kaggle, while all preprocessing pipelines, benchmark code, and documentation are openly maintained on GitHub to ensure full reproducibility and easy adoption.


Uncovering Issues in the Radio Access Network by Looking at the Neighbors

Suárez-Varela, José, Lutu, Andra

arXiv.org Artificial Intelligence

Mobile network operators (MNOs) manage Radio Access Networks (RANs) with massive amounts of cells over multiple radio generations (2G-5G). To handle such complexity, operations teams rely on monitoring systems, including anomaly detection tools that identify unexpected behaviors. In this paper, we present c-ANEMON, a Contextual ANomaly dEtection MONitor for the RAN based on Graph Neural Networks (GNNs). Our solution captures spatio-temporal variations by analyzing the behavior of individual cells in relation to their local neighborhoods, enabling the detection of anomalies that are independent of external mobility factors. This, in turn, allows focusing on anomalies associated with network issues (e.g., misconfigurations, equipment failures). We evaluate c-ANEMON using real-world data from a large European metropolitan area (7,890 cells; 3 months). First, we show that the GNN model within our solution generalizes effectively to cells from previously unseen areas, suggesting the possibility of using a single model across extensive deployment regions. Then, we analyze the anomalies detected by c-ANEMON through manual inspection and define several categories of long-lasting anomalies (6+ hours). Notably, 45.95% of these anomalies fall into a category that is more likely to require intervention by operations teams.
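The idea of judging a cell against its local neighborhood can be illustrated with a much simpler statistical baseline. The sketch below is not the GNN used in c-ANEMON; it is a plain median/MAD comparison, and the cell names, KPI values, neighbor lists, and threshold are all hypothetical.

```python
from statistics import median

def contextual_anomalies(kpi, neighbors, threshold=3.0):
    """Flag cells whose KPI deviates strongly from their neighbors' median.

    kpi:       dict mapping cell id -> KPI value (hypothetical units)
    neighbors: dict mapping cell id -> list of neighboring cell ids
    threshold: number of robust deviations (MAD) treated as anomalous
    """
    flagged = []
    for cell, value in kpi.items():
        context = [kpi[n] for n in neighbors.get(cell, []) if n in kpi]
        if len(context) < 2:
            continue  # too little context to judge this cell
        center = median(context)
        mad = median(abs(v - center) for v in context)
        scale = mad if mad > 0 else 1e-9  # avoid division by zero
        if abs(value - center) / scale > threshold:
            flagged.append(cell)
    return flagged

# Hypothetical snapshot: cell D carries far less traffic than its neighbors.
kpi = {"A": 100, "B": 103, "C": 98, "D": 20}
neighbors = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}
flagged_cells = contextual_anomalies(kpi, neighbors)
```

Because each cell is scored relative to its neighbors rather than against its own history, a change shared by the whole neighborhood tends to shift the reference value as well, which is the intuition behind filtering out mobility-driven effects.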


Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction

Tang, Peizhi, Yang, Chuang, Xing, Tong, Xu, Xiaohang, Jiang, Renhe, Sezaki, Kaoru

arXiv.org Artificial Intelligence

Human mobility prediction plays a critical role in applications such as disaster response, urban planning, and epidemic forecasting. Traditional methods often rely on hand-crafted, domain-specific models and typically focus on short-term predictions, so they struggle to generalize across diverse urban environments. In this study, we introduce Llama-3-8B-Mob, a large language model fine-tuned with instruction tuning for long-term citywide mobility prediction in a Q&A manner. We validate our approach using large-scale human mobility data from four metropolitan areas in Japan, focusing on predicting individual trajectories over the next 15 days. The results demonstrate that Llama-3-8B-Mob excels in modeling long-term human mobility, surpassing the state of the art on multiple prediction metrics. It also displays strong zero-shot generalization capabilities, effectively generalizing to other cities even when fine-tuned only on limited samples from a single city. Source code is available at https://github.com/TANGHULU6/Llama3-8B-Mob.


Urban Mobility Assessment Using LLMs

Bhandari, Prabin, Anastasopoulos, Antonios, Pfoser, Dieter

arXiv.org Artificial Intelligence

Understanding urban mobility patterns and analyzing how people move around cities helps improve the overall quality of life and supports the development of more livable, efficient, and sustainable urban areas. A challenging aspect of this work is the collection of mobility data by means of user tracking or travel surveys, given the associated privacy concerns, noncompliance, and high cost. This work proposes an innovative AI-based approach for synthesizing travel surveys by prompting large language models (LLMs), aiming to leverage their vast amount of relevant background knowledge and text generation capabilities. Our study evaluates the effectiveness of this approach across various U.S. metropolitan areas by comparing the results against existing survey data at different granularity levels. These levels include (i) pattern level, which compares aggregated metrics like the average number of locations traveled and travel time, (ii) trip level, which focuses on comparing trips as whole units using transition probabilities, and (iii) activity chain level, which examines the sequence of locations visited by individuals. Our work covers several proprietary and open-source LLMs, revealing that open-source base models like Llama-2, when fine-tuned on even a limited amount of actual data, can generate synthetic data that closely mimics the actual travel survey data, and as such provides an argument for using such data in mobility studies.
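The trip-level comparison rests on transition probabilities between location types. As a minimal sketch of how such probabilities could be estimated from activity chains (the chains and location labels below are hypothetical, and this is not the authors' evaluation code):

```python
from collections import Counter, defaultdict

def transition_probabilities(chains):
    """Estimate P(next location | current location) from activity chains."""
    counts = defaultdict(Counter)
    for chain in chains:
        # Count each consecutive pair of locations as one trip.
        for current, nxt in zip(chain, chain[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        probs[current] = {nxt: n / total for nxt, n in nxt_counts.items()}
    return probs

# Hypothetical activity chains, one per surveyed person.
survey_chains = [
    ["home", "work", "home"],
    ["home", "work", "shop", "home"],
    ["home", "school", "home"],
]
survey_probs = transition_probabilities(survey_chains)
```

Running the same function on LLM-generated chains would give a second table, and the two could then be compared row by row with a distance of one's choosing.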


Big City Bias: Evaluating the Impact of Metropolitan Size on Computational Job Market Abilities of Language Models

Campanella, Charlie, van der Goot, Rob

arXiv.org Artificial Intelligence

Large language models (LLMs) have emerged as a useful technology for job matching, for both candidates and employers. Job matching is often based on a particular geographic location, such as a city or region. However, LLMs have known biases, commonly derived from their training data. In this work, we aim to quantify the metropolitan size bias encoded within large language models, evaluating zero-shot salary, employer presence, and commute duration predictions in 384 of the United States' metropolitan regions. Across all benchmarks, we observe negative correlations between the metropolitan size and the performance of the LLMs, indicating that smaller regions are indeed underrepresented. More concretely, the smallest 10 metropolitan regions show upwards of 300% worse benchmark performance than the largest 10.


When Dialects Collide: How Socioeconomic Mixing Affects Language Use

Louf, Thomas, Ramasco, José J., Sánchez, David, Karsai, Márton

arXiv.org Artificial Intelligence

The socioeconomic background of people and how they use standard forms of language are not independent, as demonstrated in various sociolinguistic studies. However, the extent to which these correlations may be influenced by the mixing of people from different socioeconomic classes remains relatively unexplored from a quantitative perspective. In this work we leverage geotagged tweets and transferable computational methods to map deviations from standard English on a large scale, in seven thousand administrative areas of England and Wales. We combine these data with high-resolution income maps to assign a proxy socioeconomic indicator to home-located users. Strikingly, across eight metropolitan areas we find a consistent pattern suggesting that the more different socioeconomic classes mix, the less interdependent the frequency of their departures from standard grammar and their income become. Further, we propose an agent-based model of linguistic variety adoption that sheds light on the mechanisms that produce the observations seen in the data.


Changes in Commuter Behavior from COVID-19 Lockdowns in the Atlanta Metropolitan Area

Santanam, Tejas, Trasatti, Anthony, Zhang, Hanyu, Riley, Connor, Van Hentenryck, Pascal, Krishnan, Ramayya

arXiv.org Artificial Intelligence

This paper analyzes the impact of COVID-19 related lockdowns in the Atlanta, Georgia metropolitan area by examining commuter patterns in three periods: prior to, during, and after the pandemic lockdown. A cellular phone location dataset is utilized in a novel pipeline to infer the home and work locations of thousands of users using the Density-based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The coordinates derived from the clustering are put through a reverse geocoding process from which word embeddings are extracted in order to categorize the industry of each workplace based on the workplace name and Point of Interest (POI) mapping. Frequencies of commute from home locations to work locations are analyzed in and across all three time periods. Public health and economic factors are discussed to explain potential reasons for the observed changes in commuter patterns.
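To illustrate how DBSCAN can turn raw location pings into a candidate anchor location, here is a minimal standard-library sketch. It is not the paper's pipeline: the planar coordinates, the `eps` and `min_pts` values, and the rule "densest cluster is the anchor" are all assumptions made for illustration, and real geographic data would need a proper distance such as haversine.

```python
import math
from collections import Counter

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point: cluster id >= 0, or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nearby = region(j)
            if len(nearby) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nearby)
    return labels

def largest_cluster_centroid(points, labels):
    """Centroid of the most populated cluster, or None if every point is noise."""
    sizes = Counter(label for label in labels if label != -1)
    if not sizes:
        return None
    best = sizes.most_common(1)[0][0]
    members = [p for p, label in zip(points, labels) if label == best]
    return tuple(sum(axis) / len(members) for axis in zip(*members))

# Hypothetical pings for one user, in arbitrary planar units.
pings = [(0, 0), (1, 0), (0, 1), (1, 1), (50, 50), (51, 50), (50, 51), (200, 200)]
labels = dbscan(pings, eps=2, min_pts=3)
anchor = largest_cluster_centroid(pings, labels)
```

The isolated ping at (200, 200) is treated as noise, which is the property that makes density-based clustering attractive for sparse, irregular phone location data.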


A machine learning approach to analyse ozone concentration in metropolitan area of Lima, Peru

#artificialintelligence

The main objective of this study is to model ozone concentration, as an air-quality indicator, in the winter season through machine learning algorithms, detecting its impact on population health. The study area involves four monitoring stations: Ate, San Borja, Santa Anita and Campo de Marte, all located in Metropolitan Lima, with data from the years 2017, 2018 and 2019. Exploratory, correlational and predictive approaches are presented. The exploratory results showed that Ate is the station with the highest prevalence of ozone pollution. Likewise, in an hourly scale analysis, the pollution peaks were reported at 00:00 and 14:00. Finally, the machine learning models that showed the best predictive capacity for fitting the ozone concentration were linear regression and the support vector machine.


Matching tweets to ZIP codes can spotlight hot spots of COVID-19 vaccine hesitancy

#artificialintelligence

Public health officials are focusing on the 30% of the eligible population that remains unvaccinated against COVID-19 as of the end of October 2021, and that requires figuring out where those people are and why they are unvaccinated. People remain unvaccinated for many reasons, including belief in unfounded conspiracy theories about the disease, the vaccines or both; distrust of the medical establishment; concerns about risks and side effects; fear of needles; and difficulty accessing vaccines. To target their messaging and outreach geographically and according to the type of hesitancy, public health officials need good data to guide their efforts. Traditional survey methods are helpful but tend to be expensive. Another approach is to assess vaccine hesitancy through the lens of social media.